Search Result

Knowledge base question answering system based on multi-feature semantic matching

ZHAO Xiaohu, ZHAO Chenglong

Journal of Computer Applications 2020, 40 (7): 1873-1878. DOI: 10.11772/j.issn.1001-9081.2019111895

Question Answering over Knowledge Base (KBQA) mainly aims to accurately match natural language questions with triples in the Knowledge Base (KB). However, traditional KBQA methods usually focus on entity recognition and predicate matching, and errors in entity recognition can propagate and prevent the system from finding the right answer. To solve this problem, an end-to-end solution was proposed that directly matches the question against the triples. The system consists of two parts: candidate triple generation and candidate triple ranking. First, the candidate triples were generated by using the BM25 algorithm to calculate the correlation between the question and the triples in the knowledge base. Then, a Multi-Feature Semantic Matching Model (MFSMM) was used to rank the triples: MFSMM calculated semantic similarity through a Bi-directional Long Short-Term Memory network (Bi-LSTM) and character similarity through a Convolutional Neural Network (CNN), and the triples were ranked by fusing the two. On the NLPCC-ICCPOL 2016 KBQA dataset, the average F1 of the proposed system is 80.35%, which is close to the best existing performance.
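
The candidate generation step can be illustrated with a minimal sketch of Okapi BM25 scoring over triples. This is not the authors' implementation: the whitespace tokenizer, the parameters `k1=1.5` and `b=0.75`, the function name `bm25_rank`, and the toy triples are all assumptions made for illustration, and the MFSMM ranking stage is not shown.

```python
import math
from collections import Counter

def bm25_rank(question, triples, k1=1.5, b=0.75, top_k=3):
    """Return the top_k triples ranked by BM25 score against the question.

    Each triple is flattened into a bag of lowercase whitespace tokens
    (an assumption; the abstract does not describe tokenization).
    """
    if not triples:
        return []
    docs = [" ".join(t).lower().split() for t in triples]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of triples containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(question.lower().split())

    scored = []
    for triple, doc in zip(triples, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, triple))

    # Stable sort keeps the input order among equally scored triples.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [triple for _, triple in scored[:top_k]]

# Toy knowledge base (hypothetical data).
kb = [
    ("Paris", "capital of", "France"),
    ("Berlin", "capital of", "Germany"),
    ("France", "population", "67 million"),
]
candidates = bm25_rank("what is the capital of France", kb, top_k=2)
```

In a two-stage pipeline like the one described, only the returned candidates would be passed on to the more expensive neural ranking model.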

Paging-measurement method for virtual machine process code based on hardware virtualization

CAI Mengjuan, CHEN Xingshu, JIN Xin, ZHAO Cheng, YIN Mingyong

Journal of Computer Applications 2018, 38 (2): 305-309. DOI: 10.11772/j.issn.1001-9081.2017082167

In a cloud environment, the code of critical business processes in a Virtual Machine (VM) can be modified by malicious software in many ways, which threatens its stable operation. Traditional host-based measurement systems are liable to be bypassed or attacked. To solve the problem that it is difficult to obtain the complete code of a running virtual machine process and verify its integrity at the Virtual Machine Monitor (VMM) layer, a paging-measurement method based on hardware virtualization was proposed. The Kernel-based Virtual Machine (KVM) was used as the VMM to capture the system calls of virtual machine processes, which served as the trigger points of the measurement process; the semantic differences between different virtual machine versions were resolved by using relative address offsets, so that the paging-measurement method could transparently verify the code integrity of running processes in the virtual machine at the VMM layer. The implemented prototype system, VMPMS (Virtual Machine Paging-Measurement System), can effectively measure virtual machine process code with acceptable performance loss.
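
The idea of measuring code page by page can be sketched as follows. The abstract does not name a hash function or page size, so SHA-256, the 4096-byte page size, and the helper names `measure_pages` and `modified_pages` are assumptions for illustration only; the KVM system-call interception and address-offset handling are not modeled.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size; not stated in the abstract

def measure_pages(code: bytes, page_size: int = PAGE_SIZE):
    """Hash a code image one page at a time and return the digest list."""
    return [
        hashlib.sha256(code[i:i + page_size]).hexdigest()
        for i in range(0, len(code), page_size)
    ]

def modified_pages(baseline, current):
    """Return indices of pages whose digest differs from the baseline."""
    changed = [i for i, (a, b) in enumerate(zip(baseline, current)) if a != b]
    # Pages that exist on only one side also count as changed.
    shorter, longer = sorted((len(baseline), len(current)))
    changed.extend(range(shorter, longer))
    return changed

# Hypothetical code image spanning three pages (the last one partial).
original = bytes(2 * PAGE_SIZE + 100)
tampered = bytearray(original)
tampered[5000] = 0xCC  # simulate a one-byte patch inside the second page

baseline = measure_pages(original)
report = modified_pages(baseline, measure_pages(bytes(tampered)))
```

Measuring per page rather than over the whole image means a single tampered byte is localized to one page, and pages can be measured as they become available.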

Improved K-means clustering algorithm based on multi-dimensional grid space

SHAO Lun, ZHOU Xinzhi, ZHAO Chengping, ZHANG Xu

Journal of Computer Applications 2018, 38 (10): 2850-2855. DOI: 10.11772/j.issn.1001-9081.2018040830

K-means is a widely used clustering algorithm, but the traditional K-means algorithm selects its initial clustering centers at random, which makes it prone to falling into a local optimum and causes instability in the clustering result. To solve this problem, the idea of a multi-dimensional grid space was introduced into the selection of the initial clustering centers. First, the sample set was mapped to a virtual multi-dimensional grid space structure. Second, the sub-grids that contain the largest numbers of samples and are far away from each other were selected in the space structure as the initial cluster center grids. Finally, the mean points of the samples in the initial cluster center grids were calculated as the initial clustering centers. The initial clustering centers chosen by this method are very close to the actual clustering centers, so the final clustering result can be obtained stably and efficiently. In tests on a computer-simulated data set and UCI machine learning data sets, both the number of iterations and the error rate of the improved algorithm are stable and smaller than the averages of the traditional K-means algorithm. The improved algorithm can effectively avoid falling into a local optimum and guarantee the stability of the clustering result.
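
The three initialization steps can be sketched in plain Python. The abstract does not give the grid resolution or the exact "far away" criterion, so the `bins` and `min_gap` parameters, the greedy selection rule (densest cells first, at least `min_gap` cells apart in Chebyshev distance), and the function name `grid_initial_centers` are assumptions rather than the authors' method.

```python
from collections import defaultdict

def grid_initial_centers(samples, k, bins=4, min_gap=2):
    """Pick up to k initial centers from dense, mutually distant grid cells."""
    dims = len(samples[0])
    lows = [min(s[d] for s in samples) for d in range(dims)]
    highs = [max(s[d] for s in samples) for d in range(dims)]

    def cell_of(sample):
        # Step 1: map a sample to its grid cell index along every dimension.
        index = []
        for d in range(dims):
            span = highs[d] - lows[d]
            pos = 0 if span == 0 else int((sample[d] - lows[d]) / span * bins)
            index.append(min(pos, bins - 1))  # clamp the upper boundary
        return tuple(index)

    cells = defaultdict(list)
    for s in samples:
        cells[cell_of(s)].append(s)

    # Step 2: greedily take the densest cells that are far from chosen ones.
    chosen = []
    for cell in sorted(cells, key=lambda c: len(cells[c]), reverse=True):
        if all(max(abs(a - b) for a, b in zip(cell, other)) >= min_gap
               for other in chosen):
            chosen.append(cell)
        if len(chosen) == k:
            break

    # Step 3: the mean of the samples in each chosen cell is a center.
    return [
        tuple(sum(s[d] for s in cells[c]) / len(cells[c]) for d in range(dims))
        for c in chosen
    ]

# Hypothetical 2-D data: two tight groups plus one stray point.
data = [(0, 0), (0.1, 0.1), (0.2, 0), (10, 10), (9.9, 9.8), (10, 9.9), (5, 5)]
centers = grid_initial_centers(data, k=2)
```

The returned centers would then seed a standard K-means run in place of random initialization. If fewer than `k` cells satisfy the gap, this sketch returns fewer centers, a case the abstract does not address.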

Virtual machine file integrity monitoring based on hardware virtualization

ZHAO Cheng, CHEN Xingshu, JIN Xin

Journal of Computer Applications 2017, 37 (2): 388-391. DOI: 10.11772/j.issn.1001-9081.2017.02.0388

To protect the integrity of sensitive files in a Virtual Machine (VM) and to overcome the shortcomings of out-of-box monitoring based on instruction monitoring, such as high performance overhead, low compatibility and poor flexibility, OFM (Out-of-box File Monitoring) based on hardware virtualization was proposed. In OFM, the Kernel-based Virtual Machine (KVM) was used as the virtual machine monitor to dynamically configure the access control policy for sensitive files in real time; in addition, OFM could modify the entries of the virtual machine's system call table that relate to file operations, in order to determine the legitimacy of file operations performed by VM processes and to deal with illegal processes. UnixBench was deployed in a virtual machine to test the performance of OFM. The experimental results demonstrate that OFM outperforms instruction monitoring methods in file monitoring and has no effect on other types of system calls in the virtual machine. Meanwhile, OFM can effectively monitor the integrity of virtual machine files and provides better compatibility and flexibility with lower performance loss.
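
The notion of a dynamically configurable access control policy for sensitive files can be sketched as a small lookup table. Everything here is hypothetical: the class `FilePolicy`, its methods, the guarded operation names, and the example paths and process names are illustrative assumptions and do not reflect OFM's actual data structures or its KVM-level interception.

```python
class FilePolicy:
    """A toy policy table: which processes may modify which sensitive files."""

    def __init__(self):
        self._rules = {}

    def protect(self, path, allowed_processes,
                operations=("write", "unlink", "rename")):
        # Rules can be added or replaced at any time (dynamic configuration).
        self._rules[path] = (frozenset(allowed_processes),
                             frozenset(operations))

    def unprotect(self, path):
        self._rules.pop(path, None)

    def is_legal(self, process, operation, path):
        rule = self._rules.get(path)
        if rule is None:
            return True  # file is not marked sensitive
        allowed, guarded = rule
        if operation not in guarded:
            return True  # operation is not guarded, e.g. a read
        return process in allowed

policy = FilePolicy()
policy.protect("/etc/passwd", allowed_processes={"passwd"})

ok = policy.is_legal("passwd", "write", "/etc/passwd")
blocked = policy.is_legal("malware", "write", "/etc/passwd")
```

In a monitor of the kind described, a check like `is_legal` would run when a file-related system call is intercepted, and an illegal result would trigger handling of the offending process.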

Achievable information rate region of multi-source multi-sink multicast network coding

PU Baoxing, ZHU Hongpeng, ZHAO Chenglin

Journal of Computer Applications 2015, 35 (6): 1546-1551. DOI: 10.11772/j.issn.1001-9081.2015.06.1546

To solve the problem of multi-source multi-sink multicast network coding, an algorithm for computing the achievable information rate region and an approach for constructing a linear network coding scheme were proposed. Building on previous studies, the multi-source multi-sink multicast network coding problem was transformed into a specific single-source multicast network coding scenario with a constraint at the source node. Through theoretical analysis and formula derivation, the constraint relationship among the multicast rates of the source nodes was derived. Then a multi-objective optimization model was constructed to describe the boundary of the achievable information rate region. Two methods were presented for solving this model: an enumeration method, and a multi-objective optimization method based on a genetic algorithm. The achievable information rate region could be derived from the Pareto boundary of the multi-objective optimization model. After the multicast rates of the source nodes were assigned, the linear network coding scheme could be constructed by solving the constrained single-source multicast network coding scenario. The simulation results show that the proposed methods can find the boundary of the achievable information rate region, including its integral points, and construct a linear network coding scheme.
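
The enumeration method for locating the Pareto boundary of a rate region can be sketched as follows. The feasibility test is supplied by the caller because the abstract does not state the actual rate constraints; the function name `pareto_rates` and the example constraint `r1 + r2 <= 4` are assumptions for illustration.

```python
from itertools import product

def pareto_rates(max_rates, feasible):
    """Enumerate integer rate tuples and keep the non-dominated feasible ones.

    max_rates: per-source upper bound on the integer multicast rate.
    feasible:  predicate deciding whether a rate tuple is achievable
               (a stand-in for the real network constraint).
    """
    points = [
        rates
        for rates in product(*(range(m + 1) for m in max_rates))
        if feasible(rates)
    ]

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and not equal.
        return a != b and all(x >= y for x, y in zip(a, b))

    return sorted(
        p for p in points if not any(dominates(q, p) for q in points)
    )

# Two sources with individual caps 3 and 2 and a hypothetical sum constraint.
boundary = pareto_rates((3, 2), lambda r: r[0] + r[1] <= 4)
```

Every feasible integer rate tuple is dominated by or equal to some point on this boundary, so the boundary characterizes the integral points of the region. Enumeration grows exponentially with the number of sources, which is presumably why a genetic-algorithm alternative was also presented.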

Tradeoff between multicast rate and number of coding nodes based on network coding

PU Baoxing, ZHAO Chenglin

Journal of Computer Applications 2015, 35 (4): 929-933. DOI: 10.11772/j.issn.1001-9081.2015.04.0929

For single-source multicast network coding, the relationship between the multicast rate and the minimum number of coding nodes required was explored. By employing the technique of generation and extension of linear network coding, a theoretical analysis and formula derivation of this relationship were given. It is concluded that the minimum number of coding nodes required increases monotonically with the multicast rate. A multi-objective optimization model was constructed that accurately describes the quantitative relationship between the two. To solve this model, a search strategy was derived that searches all feasible coding schemes, and by combining the search strategy with NSGA-II, an algorithm for solving the model was presented. When the tradeoff between the two must be considered, the solution of the model provides the basis for choosing a network coding scheme. The proposed algorithm can search not only the whole Pareto set, but also, at lower search cost, the part of the Pareto set that corresponds to a feasible multicast rate region specified by the user. The simulation results verify the conclusion of the theoretical analysis and indicate that the proposed algorithm is feasible and efficient.
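
The tradeoff described above has two objectives: maximize the multicast rate and minimize the number of coding nodes. A minimal sketch of the dominance filter that defines the resulting Pareto set is shown below. It does not implement NSGA-II or the authors' search strategy; the function name `pareto_tradeoff`, the optional `rate_range` argument, and the candidate schemes are hypothetical.

```python
def pareto_tradeoff(schemes, rate_range=None):
    """Return non-dominated (multicast_rate, coding_nodes) pairs.

    A scheme dominates another if its rate is no lower and its number of
    coding nodes is no higher, and the two are not identical.
    rate_range, if given, is an inclusive (low, high) rate interval that
    restricts the search to part of the Pareto set.
    """
    unique = set(schemes)
    if rate_range is not None:
        low, high = rate_range
        unique = {s for s in unique if low <= s[0] <= high}

    def dominates(a, b):
        return a != b and a[0] >= b[0] and a[1] <= b[1]

    return sorted(
        s for s in unique if not any(dominates(o, s) for o in unique)
    )

# Hypothetical candidate schemes: (multicast rate, number of coding nodes).
candidates = [(1, 0), (2, 1), (2, 3), (3, 2), (3, 4)]
front = pareto_tradeoff(candidates)
partial = pareto_tradeoff(candidates, rate_range=(2, 3))
```

In this toy front the number of coding nodes rises with the rate, consistent with the monotonic relationship stated in the abstract.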
